IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



Applicants: Bartfai et al. Confirmation No.: 4463

Serial No.: 10/028,525 Group Art Unit: 2113 

Filed: Oct. 25, 2001 Examiner: Joseph D. Manoskey 

Title: Critical Adapter Local Error Handling 



CERTIFICATE OF MAILING 

I hereby certify that this correspondence is being deposited 
with the U.S. Postal Service as first class mail in an envelope 
addressed to: Commissioner for Patents, P.O. Box 1450, 
Alexandria, VA 22313-1450, on January [?]3, 2006.

. Becker 

Date of Signature: January [?]3, 2006.



To: Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 



Declaration Under 37 CFR § 1.131



We the undersigned, Dawn Moyer, Robert Bartfai, John Doxtader and Leroy Lundin, declare 
the following to be true, to the best of our knowledge and recollection: 

1. that we are the inventors of the subject invention disclosed and claimed in the
above-identified patent application, except for Nick Rash, another inventor, who is deceased;

2. that we were employed by International Business Machines Corporation in New York at 
the time of the subject invention;

3. that the subject invention described in the above-mentioned application was conceived and
reduced to practice in the United States prior to July 17, 2001; 

4. that the subject invention was embodied in code that was based upon an approved 
component design document which was also supplied to an independent (within the 
International Business Machines Corporation organization) Functional Verification Test 
(FVT) group whose responsibility was to exercise the functionality specified in the 
component design document; 

5. that the subject code embodying the subject invention was actually approved by the 
Functional Verification Test group prior to July 17, 2001, and then was also subsequently 
passed on to a separate System Test group whose responsibility it was to ensure compatibility
within the larger system environment and with other running unrelated code packages, all of 
this occurring before July 11, 2001;

6. that the subject code embodying the subject invention was "closed" by the System Test
group prior to July 17, 2001, meaning that the subject code had completed all of the testing 
phases required by the International Business Machines Corporation; 

7. that, in accordance with International Business Machines Corporation software release 
procedures, code is not released for general availability prior to full and complete testing by 
the System Test group; 

8. that code embodying the subject invention was announced as being "generally available" 
(in accordance with the same meaning given that phrase in recitation #7 immediately above) 
before the end of the year 2000; 

9. that included herewith is a copy of the above-mentioned component software design 
document that was used to implement the features of the subject invention; 

10. that all files, functions and their associated functionality recited in the claim of the 
present application were designed, implemented and operative to support Feature #47587 and 
are documented in the approved component design document; and that the component design 
document contains the ioctl names and not the actual function and file names, which are left
up to the code developers. (For instance, cadd_adapter_start in file start.c is documented
as ioctl ADAPTER_START. The ioctls are used by the FSD (Fault Service Daemon)
whereas direct function calls are used by the device driver. The ioctl is merely a wrapper for
user space code to access the hardware but it calls the exact same functions.);

11. that, with respect to the first recited step of claim 1 (that is, "detecting a nonpermanent
error condition, within an adapter connected to one of said nodes, from which recovery is 
possible from within the node connected to said adapter"), this step was implemented as 
follows: the detection of a potential error condition is identified via error class masks that 
correspond to each specific error interrupt register on the adapter. These masks classified 
which bits were treated as possible recoverable critical adapter errors. The FLIH (first level
interrupt handler) and SLIH (second level interrupt handler) functions cadd_intr and
cadd_intr_offlvl in file cadd_intr.c, respectively, on the affected node classified and
responded to all hardware error interrupts generated by the adapter. Eventually this 
classification and HW error registers were passed along to the FSD (Fault Service daemon) 
for further actions; 

12. that, with respect to the second recited step of claim 1 (that is, "suspending 
communications from within the node with the adapter affected by said error condition"), this 
step was implemented as follows: Communications are suspended from within the node via 
the suspension of all existing open windows and rejecting any new window opens on the 
adapter experiencing the critical adapter error condition. A CSS_SUSPEND_WINDOW
event is posted to all registered window owners (HAL and IP (Hardware Abstraction Layer 
and Interface Protocol)) which leave the windows open, drop packets and return successfully 
on reads and writes. There is no explicit notification to the protocols (Application Program 
Interfaces - APIs) that the window resources are no longer available. This notification 
caused the protocols to terminate the running applications 100% of the time. The FLIH 
function cadd_intr in file cadd_intr.c used the cadd_suspend_windows function in file
cadd_intr.c to suspend communications within the node without termination of any running
applications. HAL (Hardware Abstraction Layer) used function _col_suspend_win in file
col.c and IP used function ifcl_suspend_window in file ifcl_cfg.c to take appropriate actions
on the posted suspend event from the FLIH;

13. that, with respect to the third recited step of claim 1 (that is, "disabling communication
between said affected adapter and said switch so as to provide an indication to at least one 
other node in said network that communication with said affected adapter is at least 
temporarily suspended so as to effectively cause suspension of, but not termination of,
applications running on said at least one other node in said network"), this step was 
implemented as follows: The SLIH (Second Level Interrupt Handler) for critical adapter 
errors on the affected node resets the affected adapter to clear up any possible hang 
conditions. Resetting the adapter resulted in disabling the link (link no longer timed) which 
in turn caused link sync failures to be raised to the FSD switch recovery code running on the 
Primary node (FSD central point of control). Switch recovery suspended thresholding of link 
sync errors for 10 seconds. After 10 seconds, switch recovery continued to count link syncs 
within a specific time period for thresholding purposes. If the recovery actions on the 
affected node did not re-enable the link within the 10 second recovery window, the FSD 
switch recovery on the primary node "thresholded" and fenced the affected adapter off the 
switch. This fence (adapter recovery failure path) released all window resources on the 
affected adapter which resulted in the termination of running applications using these 
resources. The SLIH disabled communications between the affected adapter and the primary 
FSD node via the function cadd_adapter_reset in file reset.c. The FSD switch recovery 
suspended thresholding and fenced the adapter upon meeting the link sync threshold via 
function CS_Switch_error_recovery in file CSrecovery.c. Resources were released via the
existing FSD base function cadd_adapterResourceRelease in file cadd_auth.c.

14. that, with respect to the fourth recited step of claim 1 (that is, "performing recovery 
operations, at said affected node, to restore operation of said affected adapter, based on said
detected error condition, said recovery including enablement of said disabled 
communication"), this step was implemented as follows: The FSD adapter function 
fs_daemon_fsm_adapter_thread_main in file fsd_fsm_adapt.c started the adapter which
re-enabled the link via function cadd_adapter_start in file start.c.

15. that, with respect to the fifth recited step of claim 1 (that is, "resuming communication 
with said affected adapter upon enablement of said disabled communication"), this step was 
implemented as follows: The FSD resumed communication on the affected adapter after
successful recovery actions by resuming the suspended windows on the adapter via
cadd_resume_windows in file cadd_auth.c.

16. that the functions, calls, files and processes described above are found in the included 
component design document, subject to the naming clarification set forth in Item No. 10 
above. 

17. that an examination of the records contained within a permanent file designated CMVC 
(Configuration Management Version Control) indicates that the subject feature was 
completely tested and approved by the FVT team prior to July 17, 2001 and that this record 
contains the following information:

We hereby declare that all statements made herein of our own knowledge are true and that all
statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made
are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United
States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 



Respectfully submitted, 



Date Dawn S. Moyer 



Date Robert F. Bartfai 




Date Leroy R Lundin